31 research outputs found

    Influence Maximization: Near-Optimal Time Complexity Meets Practical Efficiency

    Given a social network G and a constant k, the influence maximization problem asks for k nodes in G that (directly and indirectly) influence the largest number of nodes under a pre-defined diffusion model. This problem finds important applications in viral marketing, and has been extensively studied in the literature. Existing algorithms for influence maximization, however, either trade approximation guarantees for practical efficiency, or vice versa. In particular, among the algorithms that achieve constant factor approximations under the prominent independent cascade (IC) model or linear threshold (LT) model, none can handle a million-node graph without incurring prohibitive overheads. This paper presents TIM, an algorithm that aims to bridge the theory and practice in influence maximization. On the theory side, we show that TIM runs in O((k+\ell) (n+m) \log n / \epsilon^2) expected time and returns a (1-1/e-\epsilon)-approximate solution with at least 1 - n^{-\ell} probability. The time complexity of TIM is near-optimal under the IC model, as it is only a \log n factor larger than the \Omega(m + n) lower-bound established in previous work (for fixed k, \ell, and \epsilon). Moreover, TIM supports the triggering model, which is a general diffusion model that includes both IC and LT as special cases. On the practice side, TIM incorporates novel heuristics that significantly improve its empirical efficiency without compromising its asymptotic performance. We experimentally evaluate TIM with the largest datasets ever tested in the literature, and show that it outperforms the state-of-the-art solutions (with approximation guarantees) by up to four orders of magnitude in terms of running time. In particular, when k = 50, \epsilon = 0.2, and \ell = 1, TIM requires less than one hour on a commodity machine to process a network with 41.6 million nodes and 1.4 billion edges.
Comment: Revised Sections 1, 2.3, and 5 to remove incorrect claims about reference [3]. Updated experiments accordingly. A shorter version of the paper will appear in SIGMOD 201
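
To make the problem statement above concrete, here is a minimal sketch of influence spread under the independent cascade (IC) model, estimated by Monte Carlo simulation and combined with a plain greedy seed selection. This is not TIM and carries none of its running-time guarantees; the function names, the graph encoding (node mapped to a list of `(neighbor, probability)` pairs), and the default number of runs are all assumptions made for illustration.

```python
import random


def simulate_ic(graph, seeds, rng):
    """Run one independent cascade from `seeds`; return the set of activated nodes.

    `graph` maps each node to a list of (neighbor, probability) pairs (assumed encoding).
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # A newly activated node gets one chance to activate each inactive neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def estimate_spread(graph, seeds, runs=200, seed=0):
    """Average number of activated nodes over `runs` simulated cascades."""
    rng = random.Random(seed)
    return sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs)) / runs


def greedy_seeds(graph, k, runs=200):
    """Pick up to k seeds, each time adding the node with the best estimated spread.

    Nodes must be sortable so that ties are broken deterministically.
    """
    nodes = set(graph)
    for neighbors in graph.values():
        nodes.update(v for v, _ in neighbors)
    chosen = []
    for _ in range(min(k, len(nodes))):
        candidates = sorted(nodes - set(chosen))
        best = max(candidates, key=lambda v: estimate_spread(graph, chosen + [v], runs))
        chosen.append(best)
    return chosen
```

Each greedy step re-simulates the cascade for every candidate node, which is the kind of overhead that makes this naive approach impractical on large graphs.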

    Quantifying rainfall-derived inflow and infiltration in sanitary sewer systems based on conductivity monitoring

    Quantifying rainfall-derived inflow and infiltration (RDII) in a sanitary sewer is difficult when RDII and overflow occur simultaneously. This study proposes a novel conductivity-based method for estimating RDII. The method separates RDII into rainfall-derived inflow (RDI) and rainfall-induced infiltration (RII) on the basis of conductivity data. A fast Fourier transform was used to analyze variations in the flow and water quality during dry weather. Nonlinear curve fitting based on the least squares algorithm was used to optimize the parameters of the proposed RDII model. The method was applied to real-life case studies, in which inflow and infiltration were successfully estimated for three typical rainfall events with total rainfall volumes of 6.25 mm (light), 28.15 mm (medium), and 178 mm (heavy). Uncertainties of the model parameters were estimated using the generalized likelihood uncertainty estimation (GLUE) method and were found to be acceptable. Compared with traditional flow-based methods, the proposed approach exhibits distinct advantages in estimating RDII and overflow, particularly when the two processes happen simultaneously.

    KIN10 promotes stomatal development through stabilization of the SPEECHLESS transcription factor

    Stomata are epidermal structures that modulate gas exchange between plants and the atmosphere. The formation of stomata is regulated by multiple developmental and environmental signals, but how these signals are coordinated to control this process remains unclear. Here, we show that the conserved energy sensor kinase SnRK1 promotes stomatal development under a short-day photoperiod or in liquid culture conditions. Mutation of KIN10, the catalytic α-subunit of SnRK1, results in a decreased stomatal index, while overexpression of KIN10 significantly induces stomatal development. KIN10 displays a cell-type-specific subcellular localization pattern. Nuclear-localized KIN10 proteins are highly enriched in the stomatal lineage cells, where they phosphorylate and stabilize SPEECHLESS, a master regulator of stomatal formation, thereby promoting stomatal development. Our work identifies a module linking energy signaling and stomatal development and reveals that multiple regulatory mechanisms are in place for SnRK1 to modulate stomatal development in response to changing environments.

    Robust estimation of bacterial cell count from optical density

    Optical density (OD) is widely used to estimate the density of cells in liquid culture, but cannot be compared between instruments without a standardized calibration protocol and is challenging to relate to actual cell count. We address this with an interlaboratory study comparing three simple, low-cost, and highly accessible OD calibration protocols across 244 laboratories, applied to eight strains of constitutive GFP-expressing E. coli. Based on our results, we recommend calibrating OD to estimated cell count using serial dilution of silica microspheres, which produces highly precise calibration (95.5% of residuals <1.2-fold), is easily assessed for quality control, also assesses instrument effective linear range, and can be combined with fluorescence calibration to obtain units of Molecules of Equivalent Fluorescein (MEFL) per cell, allowing direct comparison and data fusion with flow cytometry measurements: in our study, fluorescence per cell measurements showed only a 1.07-fold mean difference between plate reader and flow cytometry data
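
The calibration idea in the abstract above (relating OD to an estimated particle count through a serial dilution of microspheres) can be sketched as a least-squares fit through the origin. This is a simplified illustration, not the study's protocol: the function names, the two-fold dilution factor, the blank correction, and all numbers in the usage note are assumptions, and the fit is only meaningful for readings inside the instrument's linear range.

```python
def serial_dilution(start_count, steps, factor=2.0):
    """Particle counts for a dilution series: start_count, start_count/factor, ..."""
    return [start_count / factor**i for i in range(steps)]


def fit_particles_per_od(particle_counts, od_readings, blank_od=0.0):
    """Least-squares slope through the origin: particles ~ slope * (OD - blank)."""
    net = [od - blank_od for od in od_readings]
    numerator = sum(x * y for x, y in zip(net, particle_counts))
    denominator = sum(x * x for x in net)
    if denominator == 0:
        raise ValueError("all blank-corrected OD readings are zero")
    return numerator / denominator


def od_to_count(od, slope, blank_od=0.0):
    """Convert a raw OD reading to an estimated particle (or cell) count."""
    return slope * (od - blank_od)
```

With hypothetical readings of 0.85, 0.45, 0.25, and 0.15 for a series starting at 8e8 particles and a blank of 0.05, the fitted slope comes out near 1e9 particles per OD unit.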

    Algorithms for influence maximization and seed minimization

    Graphs are a basic mathematical tool for modeling entities and their complex relationships in various real-world problems. They have found important applications in social network analysis, route planning, telecommunication, etc. In recent years, the complexity and scale of real-world graphs have increased dramatically. In particular, international social networks can comprise hundreds of millions of users and up to billions of relationships. Thus, even algorithms with decent time or space complexities face challenges when handling queries such as influence maximization and seed minimization on large-scale social networks. In this thesis, we investigate these two problems on large-scale networks. Given a social network G, a probabilistic propagation model M, and a small number k > 0, the influence maximization problem asks for k nodes that maximize the expected number of influenced nodes under the pre-defined model M. This problem is derived from viral marketing, where a company gives away free samples to a small number of influential individuals in order to create a cascade of adoption via the word-of-mouth effect. This study proposes a two-phase approach, Influence Maximization via Martingales (IMM), that offers both practical efficiency and theoretical guarantees. In particular, IMM returns a (1 − 1/e − ε)-approximate solution with probability at least 1 − n^(−ℓ) in O((k+ℓ)(n+m) log n / ε^2) running time. IMM is further extended to the triggering model and the time-continuous model. We experimentally evaluate IMM against state-of-the-art benchmarks under several diffusion models and parameter settings, using large networks with up to 1.4 billion edges. The experimental results show that our approach consistently outperforms the state of the art in terms of efficiency.
The seed minimization problem is a variant of influence maximization that also originates from advertising. Given a social network G and a covering threshold t, the seed minimization problem aims to find a seed set S of minimum size whose expected number of influenced nodes is at least t·n. Whereas influence maximization maximizes the influence under a given budget, seed minimization seeks to reduce the cost to the minimum number of seeds while keeping the influence above a predefined threshold. To solve the problem, we propose GSM, a greedy algorithm with tight approximation guarantees, high generality, and an easy implementation. In particular, it yields a ⌈(1 + ϕ)log(tn)⌉-approximate solution with probability at least 1 − n^(−ℓ), where ℓ and ϕ are both tunable. We experimentally evaluate GSM under several settings of both t and β, and it is often orders of magnitude faster than the traditional greedy benchmark MINTSS. GSM also performs impressively on the large Twitter graph, which has more than a billion edges. Master of Engineerin
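
The seed minimization objective described above can be illustrated with a generic greedy loop: keep adding the node with the largest resulting spread until the spread reaches t·n. This is a sketch of the problem setup only, not GSM; the `spread` argument is a hypothetical oracle supplied by the caller, and the `reachable` helper stands in for expected influence in the special case where every edge fires with certainty.

```python
def reachable(graph, seeds):
    """Nodes reachable from `seeds` in a directed graph given as node -> list of successors.

    Used here as a stand-in for expected influence when every edge fires (assumption).
    """
    seen = set(seeds)
    stack = list(seeds)
    while stack:
        u = stack.pop()
        for v in graph.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def greedy_seed_minimization(nodes, spread, t):
    """Add the node giving the largest spread until spread(S) >= t * n.

    `spread` is a caller-supplied function mapping a seed list to its (expected) influence.
    """
    nodes = sorted(nodes)  # sorted so that ties are broken deterministically
    target = t * len(nodes)
    seeds = []
    while spread(seeds) < target and len(seeds) < len(nodes):
        candidates = [v for v in nodes if v not in seeds]
        best = max(candidates, key=lambda v: spread(seeds + [v]))
        seeds.append(best)
    return seeds
```

In practice the spread oracle would be an estimator under a probabilistic diffusion model, which is where sampling error and the probabilistic guarantee enter.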

    Probabilistic Caching Placement in the Presence of Multiple Eavesdroppers

    Wireless caching has attracted considerable attention in recent years, since it can significantly reduce the backhaul cost and improve the user-perceived experience. Existing work on wireless caching and transmission mainly focuses on communication scenarios without eavesdroppers. When eavesdroppers are present, it is vital to investigate physical-layer security for wireless caching-aided networks. In this paper, a caching network is studied in the presence of multiple eavesdroppers, which can overhear the secure information transmission. We model the locations of the eavesdroppers by a homogeneous Poisson Point Process (PPP), and the eavesdroppers jointly receive and decode contents through maximum ratio combining (MRC) reception, which yields the worst case of wiretapping. The main performance metric is the average probability of successful transmission, which is the probability of finding and successfully transmitting all the requested files within a radius R. We study the system's secure transmission performance by deriving a single-integral result, which is significantly affected by the probability of caching each file. We therefore formulate an optimization problem over the caching probability of each file in order to optimize the secure transmission performance. This optimization problem is nonconvex, so we use a genetic algorithm (GA) to solve it. Finally, simulation and numerical results are provided to validate the proposed studies.

    Rubber Identification Based on Blended High Spatio-Temporal Resolution Optical Remote Sensing Data: A Case Study in Xishuangbanna

    As an important economic resource, rubber has rapidly grown in Xishuangbanna of Yunnan Province, China, since the 1990s. Tropical rainforests have been replaced by extensive rubber plantations, which has resulted in ecological problems such as the loss of biodiversity and local water shortages. It is vitally important to accurately map the rubber plantations in this region. Although several rubber mapping methods have been proposed, few studies have investigated methods based on optical remote sensing time series data with high spatio-temporal resolution due to the cloudy and foggy weather conditions in this area. This study presented a rubber plantation identification method that used spatio-temporal optical remote sensing data fusion technology to obtain vegetation index data at high spatio-temporal resolution within the optical remote sensing window in Xishuangbanna. The analysis of the proposed method shows that (1) fused optical remote sensing data with high spatio-temporal resolution could map the rubber distribution with high accuracy (overall accuracy of up to 89.51% and kappa of 0.86). (2) Fused indices have high R2 (R2 greater than 0.8, where R is the correlation coefficient) with the indices that were derived from the Landsat observed data, which indicates that fusion results are dependable. However, the fusion accuracy is affected by terrain factors including elevation, slope, and slope aspects. These factors have obvious negative effects on the fusion accuracy of high spatio-temporal resolution optical remote sensing data: the highest fusion accuracy occurred in areas with elevations between 1201 and 1400 m.a.s.l., and the lowest accuracy occurred in areas with elevations less than 600 m.a.s.l. 
For the 5 fused time series indices (normalized difference vegetation index (NDVI), enhanced vegetation index (EVI), normalized difference moisture index (NDMI), normalized burn ratio (NBR), and tasseled cap angle (TCA)), the fusion accuracy decreased with increasing slope, and increasing slope had the least impact on the EVI, but the greatest negative impact on the NDVI; the slope aspect had a limited influence on the fusion accuracies of the 5 time series indices, but fusion accuracy was lowest on the northwest slope. (3) EVI had the highest accuracy of rubber plantation classification among the 5 time series indices, and the overall classification accuracies of the time series EVI for the four different years (2000, 2005, 2010, and 2015) reached 87.20% (kappa 0.82), 86.91% (kappa 0.81), 88.85% (kappa 0.84), and 89.51% (kappa 0.86), respectively. The results indicate that the method is a promising approach for rubber plantation mapping and the detection of changes in rubber plantations in this tropical area
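
Four of the five indices named above have widely used band-ratio definitions, sketched below for surface reflectance inputs. The function names and argument order are assumptions, the EVI coefficients are the commonly used ones rather than values confirmed by this study, and the tasseled cap angle is omitted because it depends on sensor-specific transform coefficients.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b); returns NaN when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else float("nan")


def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return normalized_difference(nir, red)


def ndmi(nir, swir1):
    """Normalized difference moisture index (uses the first shortwave infrared band)."""
    return normalized_difference(nir, swir1)


def nbr(nir, swir2):
    """Normalized burn ratio (uses the second shortwave infrared band)."""
    return normalized_difference(nir, swir2)


def evi(nir, red, blue):
    """Enhanced vegetation index with the commonly used coefficients (2.5, 6, 7.5, 1)."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
```

For example, hypothetical reflectances of 0.5 (NIR) and 0.1 (red) give an NDVI of about 0.67.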

    Short-chain fatty acid (SCFA) production maximization by modeling thermophilic sludge fermentation

    Producing more versatile green chemicals such as short-chain fatty acids (SCFAs) from waste activated sludge (WAS) via fermentation is a promising approach in wastewater treatment. In this study, we investigated how SCFA production can be maximized within wastewater treatment plants. An ordinary differential equation model was devised that encompasses organic inputs to a reactor as well as acetogenic and methanogenic bacterial populations. The model was calibrated and validated on an independent set of data during thermophilic sludge fermentation. A series of experiments were performed to determine the effects of reactor parameters, such as organic inputs and solid retention time (SRT), on SCFA production. Model simulation results show that optimization of SRT plays an important role in improving SCFA production. SCFA production can be enhanced with an increase in biodegradable particulate organic matter, an increase in acidogenic bacteria or a decrease in methanogens in the feed sludge. The maximum SCFA yield has been proven to benefit from thermophilic fermentation at a temperature of up to 50 °C, in which the maximum SCFA yield reaches 18% per VSS in terms of COD. The model predictions indicated that a high ratio of acidogenic bacteria to methanogens (i.e., 2 : 1) in the WAS is critical to achieve a maximum yield of approximately 30%